consumer device
OpenAI Just Released Its First Open-Weight Models Since GPT-2
OpenAI just dropped its first open-weight models in over five years. The two language models, gpt-oss-120b and gpt-oss-20b, can run locally on consumer devices and can be fine-tuned for specific purposes. For OpenAI, they represent a shift away from its recent strategy of focusing on proprietary releases, as the company moves toward a broader, more open range of AI models available to users. "We're excited to make this model, the result of billions of dollars of research, available to the world to get AI into the hands of the most people possible," said OpenAI CEO Sam Altman in an emailed statement. Both gpt-oss-120b and gpt-oss-20b are officially available to download for free on Hugging Face, a popular hosting platform for AI tools.
PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices
Yangyijian Liu, Jun Li, Wu-Jun Li
The high memory and computation demands of large language models (LLMs) make them challenging to deploy on consumer devices with limited GPU memory. Offloading can mitigate the memory constraint but often suffers from low GPU utilization, leading to low inference efficiency. In this work, we propose a novel framework, called pipelined offloading (PIPO), for efficient inference on consumer devices. PIPO designs a fine-grained offloading pipeline, complemented with optimized data transfer and computation, to achieve high concurrency and efficient scheduling for inference. Experimental results show that, compared with the state-of-the-art baseline, PIPO increases GPU utilization from below 40% to over 90% and achieves up to 3.1× higher throughput when running on a laptop equipped with an RTX 3060 GPU with 6GB of memory.
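The benefit of overlapping weight transfers with computation can be illustrated with a small timing model. This is not PIPO's scheduler: it assumes a single CPU-to-GPU transfer link, one compute stream, fixed per-layer load and compute times, and a hypothetical `buffers` parameter for how many layers' weights fit on the GPU at once. It compares a naive load-then-compute loop with a double-buffered pipeline that fetches layer i+1 while layer i is computing.

```python
from typing import List, Sequence


def sequential_time(load: Sequence[float], compute: Sequence[float]) -> float:
    """Naive offloading: each layer is copied to the GPU, then computed; nothing overlaps."""
    return sum(l + c for l, c in zip(load, compute))


def pipelined_time(load: Sequence[float], compute: Sequence[float], buffers: int = 2) -> float:
    """Overlap the transfer of upcoming layers with computation of the current one.

    `buffers` (hypothetical) is how many layers' weights fit on the GPU at once, so
    layer i can only be loaded into the slot freed when layer i - buffers finished.
    """
    link_free = 0.0                # time at which the transfer link is idle again
    compute_end: List[float] = []  # finish time of each layer's computation
    for i, (l, c) in enumerate(zip(load, compute)):
        slot_free = compute_end[i - buffers] if i >= buffers else 0.0
        load_done = max(link_free, slot_free) + l
        link_free = load_done
        gpu_free = compute_end[-1] if compute_end else 0.0
        # Compute starts once the weights have arrived and the previous layer is done.
        compute_end.append(max(load_done, gpu_free) + c)
    return compute_end[-1] if compute_end else 0.0


def gpu_utilization(total_time: float, compute: Sequence[float]) -> float:
    """Fraction of wall-clock time the GPU spends computing."""
    return sum(compute) / total_time


if __name__ == "__main__":
    load = [1.0] * 8     # assumed per-layer transfer time (arbitrary units)
    compute = [1.0] * 8  # assumed per-layer compute time
    seq, pipe = sequential_time(load, compute), pipelined_time(load, compute)
    print(seq, pipe)  # 16.0 9.0
    print(f"{gpu_utilization(seq, compute):.2f} {gpu_utilization(pipe, compute):.2f}")  # 0.50 0.89
```

With equal load and compute times, overlapping cuts the eight-layer run from 16 to 9 time units and raises the GPU's busy fraction from 50% to about 89%. With `buffers=1` the schedule degenerates to the sequential one, and when transfers take longer than computation the pipeline is bounded by the link instead, which is where optimizing the transfers themselves matters.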
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
As large language models gain widespread adoption, running them efficiently becomes a crucial task. Recent works on LLM inference use speculative decoding to achieve extreme speedups. However, most of these works implicitly design their algorithms for high-end datacenter hardware. In this work, we ask the opposite question: how fast can we run LLMs on consumer machines? Consumer GPUs can no longer fit the largest available models and must offload them to RAM or SSD.
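Speculative decoding lets a small draft model guess several tokens that the large target model then verifies together, so one expensive target pass can yield more than one token; this is attractive when each target pass is slow, for example because its weights are offloaded to RAM or SSD. The sketch below shows only the basic greedy draft-and-verify loop, not SpecExec's own algorithm, which its title describes as massively parallel. The `NextToken` interface and the toy `target_model` and `draft_model` functions are hypothetical stand-ins for real LLMs.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: ordinary one-token-at-a-time greedy decoding with the target model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; yields exactly the target's greedy output."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real system scores all
        #    k positions in one batched forward pass; here it is called per position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # all k accepted: one extra token
        seq.extend(accepted)
    return seq[:goal]


# Toy stand-ins for a large target model and a small draft model (hypothetical).
def target_model(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 11


def draft_model(seq: List[int]) -> int:
    guess = target_model(seq)
    return guess if len(seq) % 4 else (guess + 1) % 11  # wrong on every 4th position


if __name__ == "__main__":
    out = speculative_decode(target_model, draft_model, [1, 2], n_new=12)
    assert out == greedy_decode(target_model, [1, 2], n_new=12)
    print(out)
```

Because every kept token is the one the target would have chosen given the same prefix, the output matches plain greedy decoding; the draft model only changes how many tokens each verification round produces.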
Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks
Democratization of AI is an important topic within the broader issue of the digital divide. This issue is relevant to LLMs, which are becoming popular as AI co-pilots but suffer from a lack of accessibility due to their high computational demand. In this study, we evaluate whether quantization is a viable approach toward enabling LLMs on generic consumer devices. The study assesses the performance of five quantized code LLMs on Lua code generation tasks. To evaluate the impact of quantization, the models with 7B parameters were tested on a consumer laptop at 2-, 4-, and 8-bit integer precision and compared to non-quantized code LLMs with 1.3, 2, and 3 billion parameters. Lua is chosen as a low-resource language to avoid models' biases related to high-resource languages. The results suggest that the models quantized at 4-bit integer precision offer the best trade-off between performance and model size. These models can be comfortably deployed on an average laptop without a dedicated GPU. Performance drops significantly at 2-bit integer precision. The models at 8-bit integer precision require more inference time, which does not effectively translate into better performance. The 4-bit models with 7 billion parameters also considerably outperform non-quantized models with fewer parameters despite having comparable model sizes in terms of storage and memory demand. While quantization indeed increases the accessibility of smaller LLMs with 7 billion parameters, these LLMs demonstrate overall low performance (less than 50%) on high-precision, low-resource tasks such as Lua code generation. Although accessibility is improved, usability is still not at a practical level comparable to foundation LLMs such as GPT-4o or Llama 3.1 405B.
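The trade-off between bit width and model size comes down to simple arithmetic: weight storage is roughly parameters × bits ÷ 8 bytes, and fewer bits mean a coarser grid of representable values. The sketch uses per-tensor symmetric round-to-nearest quantization, a common baseline; the abstract does not say which quantization scheme the evaluated models used, so this is an illustration rather than the study's method. For the size comparison it also assumes that the non-quantized 1.3B to 3B models are stored at 16-bit precision, and it counts weights only.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Per-tensor symmetric round-to-nearest quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp for safety against float rounding; with this scale |w / scale| <= qmax.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Storage for the weights alone (no activations or KV cache), in decimal GB."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    w = [0.9, -0.35, 0.12, -0.8, 0.05, 0.6]
    for bits in (2, 4, 8):
        q, s = quantize_symmetric(w, bits)
        err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
        print(f"{bits}-bit: codes used={sorted(set(q))} max error={err:.4f}")
    # 7B parameters at 4 bits vs. smaller models assumed to be stored at 16 bits.
    print(weight_memory_gb(7e9, 4))  # 3.5
    print(weight_memory_gb(1.3e9, 16), weight_memory_gb(3e9, 16))
```

Under those assumptions a 7B model at 4 bits needs about 3.5 GB, between the roughly 2.6 GB and 6 GB of the 1.3B and 3B models at 16 bits, consistent with the abstract's note that the sizes are comparable. At 2 bits this simple symmetric scheme uses only three of the four available codes per tensor, a very coarse grid that gives some intuition for the sharp drop in quality.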
A Hybrid Future for AI
Nvidia's rise to a $2 trillion valuation at the beginning of 2024 underscored the extraordinary computing demands of the artificial intelligence systems that power ChatGPT and a host of other cloud services that create videos, music, and computer programs on demand. The power of computing and memory scaling has provided much of the impetus behind the surge in interest in generative AI based on large language models (LLMs). As models grow larger, they appear to exhibit emergent behavior that makes them more useful. But because growth in parameter counts has easily outstripped Moore's Law, such scaling comes at a high cost. Much of the concern about resource usage has focused on the enormous arrays of graphics processing units (GPUs) and accelerators in the training grids used to train models for weeks at a time.
What to expect from Microsoft Build 2024: The Surface event, Windows 11 and AI
If you can't tell by now, just about every tech company is eager to pray at the altar of AI, for better or worse. Google's recent I/O developer conference was dominated by AI features, like its seemingly life-like Project Astra assistant. Just before that, OpenAI debuted GPT-4o, a free and conversational AI model that's disturbingly flirty. Next up is Microsoft Build 2024, the company's developer conference that's kicking off next week in Seattle. Normally, Build is a fairly straightforward celebration of Microsoft's devotion to productivity, with a dash of on-stage coding to excite the developer crowd.
Radair.io Launches First Consumer Device - The Radair Mini Gateway
DBA Radair.io, a provider of high-grade, multi-protocol Internet of Things (IoT) devices and IoT-based enterprise solutions that drive operational efficiencies across select industries, is pleased to announce the launch of its first consumer device, the Radair Mini Gateway. The Radair Mini Gateway will support multiple ecosystems, including Helium (upon HIP19 approval) as well as The Radair Foundation's forthcoming ecosystem. The Mini Gateway will also be the industry's first light gateway with environmental monitoring that detects volatile organic compounds (VOCs), volatile sulfur compounds (VSCs), carbon monoxide, smoke, pollutants, and various gases, with insights powered by an embedded 4-in-1 Bosch sensor. The Mini Gateway includes Wi-Fi 6E, GPS, and a barometric pressure (altitude) sensor to future-proof it against changes in earning protocols, alongside the industry-leading LoRa concentrator, all elegantly built into a single device. Backed by a US-based team and an industry-leading warranty, the Radair Mini Gateway sets the new standard for IoT miners.
'Smart' To 'AI' Paradigm Shift In Edge Computing
Uniquify, a Silicon Valley neural network technology and AI edge computing company, is announcing a proprietary neural network and AI modeling technology that introduces a new paradigm for transitioning consumer smart devices into consumer AI devices. The bottleneck to adopting advanced AI technology isn't the AI models or platforms but how to deploy these complex AI models economically for consumers at the edge. Uniquify's Neural Network 2.0 and AI modeling technology will enable many consumer products to become AI devices, so that consumers can benefit from advanced AI models while protecting their privacy by running services at the edge. "We have seen many consumer devices like the phone, car, and TV go through a 'smart' paradigm shift in the past few decades," says Josh Lee, CEO of Uniquify. "The world is ready for an 'AI' paradigm shift to trigger replacement cycles in those consumer industries and more. I believe today's advanced AI models can be grafted into numerous consumer devices to provide richer experiences and enhanced capabilities for consumers. We believe we are ready to kickstart the 'smart' to 'AI' paradigm shift with our proprietary Neural Network 2.0 and AI modeling technology."
What are the upcoming policies that will shape AI – and are policymakers up to the task?
As vice president and director of governance studies at the Brookings Institution, and a senior fellow at its Center for Technology Innovation, Darrell M. West spends a lot of time thinking about the intersection of policy and emerging tech. In his recent book, Turning Point: Policymaking in the Era of Artificial Intelligence, co-authored with Brookings President John R. Allen, West looks at AI use cases – "from self-driving cars to e-commerce algorithms that seem to know what you want to buy before you do" – and assesses where they're headed and how they will be shaped by policy decisions made today. The key challenge – not least in healthcare, where patient safety is paramount – is to devise regulatory guardrails that maximize the benefits of AI and machine learning and minimize their potentially dangerous downsides. In the book, West and Allen offer a series of recommendations – bolstering governmental oversight, creating new specialized advisory boards at federal agencies, third-party auditing to sniff out algorithmic bias and more. At the upcoming HIMSS Machine Learning & AI for Healthcare event, West will offer a presentation titled "The Latest Regulatory Developments Impacting Machine Learning and AI in Healthcare," where he'll explore potential new policy shifts around clinical uses of artificial intelligence: algorithmic bias, remote patient monitoring, patient safety, fitness trackers and more.
Artificial Intelligence: What is it in reality?
Artificial intelligence (AI) is the simulation of human actions and intelligence by computers. It combines many technologies, such as machine learning, natural language processing and applied intelligence. AI systems are often grouped by capability; two of these groups are reactive machines and limited-memory machines. Reactive machines cannot learn or adapt, so they are not used for memory-based scenarios; instead, they provide automatic responses to a limited set of inputs. Limited-memory machines can learn from historical data and use it to make decisions. They rely on deep learning techniques for training and for storing memory.
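The difference between the two categories can be shown with a toy example. The classes and rules are hypothetical, and the "memory" here is a fixed window of past observations rather than a trained deep learning model: the reactive agent maps each input straight to an action, while the limited-memory agent bases its decision on recent history.

```python
from collections import deque
from typing import Deque


class ReactiveAgent:
    """Fixed stimulus-response rules: the same input always yields the same action."""

    RULES = {"obstacle": "stop", "clear": "go"}

    def act(self, observation: str) -> str:
        return self.RULES.get(observation, "wait")


class LimitedMemoryAgent:
    """Keeps a bounded window of recent observations and decides from that history."""

    def __init__(self, window: int = 3) -> None:
        # A deque with maxlen silently discards the oldest observation when full.
        self.history: Deque[str] = deque(maxlen=window)

    def act(self, observation: str) -> str:
        self.history.append(observation)
        if observation == "obstacle":
            return "stop"
        # Proceed only after the path has been clear for the whole window.
        full = len(self.history) == self.history.maxlen
        return "go" if full and all(o == "clear" for o in self.history) else "wait"


if __name__ == "__main__":
    inputs = ["obstacle", "clear", "clear", "clear"]
    reactive, memory = ReactiveAgent(), LimitedMemoryAgent()
    print([reactive.act(o) for o in inputs])  # ['stop', 'go', 'go', 'go']
    print([memory.act(o) for o in inputs])    # ['stop', 'wait', 'wait', 'go']
```

Given the same inputs, the reactive agent resumes the moment the path is clear, while the limited-memory agent waits until its recent history supports the decision.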